## # A tibble: 19 x 2
## Q29 n
## <dbl> <int>
## 1 13 915
## 2 NA 771
## 3 5 745
## 4 4 636
## 5 8 351
## 6 3 180
## 7 11 173
## 8 16 151
## 9 10 128
## 10 7 119
## 11 17 114
## 12 1 84
## 13 12 71
## 14 18 69
## 15 6 55
## 16 2 21
## 17 14 13
## 18 9 6
## 19 15 3
## # A tibble: 5 x 2
## Q32 n
## <dbl> <int>
## 1 1 1038
## 2 2 1122
## 3 3 1121
## 4 4 702
## 5 NA 622
## # A tibble: 5 x 2
## Q32 n
## <dbl> <int>
## 1 1 72
## 2 2 81
## 3 3 80
## 4 4 75
## 5 NA 463
## # A tibble: 13 x 2
## Q33 n
## <dbl> <int>
## 1 1 1004
## 2 2 63
## 3 3 143
## 4 4 904
## 5 5 60
## 6 6 68
## 7 7 66
## 8 8 252
## 9 9 419
## 10 10 420
## 11 11 224
## 12 12 388
## 13 NA 594
## # A tibble: 13 x 2
## Q33 n
## <dbl> <int>
## 1 1 947
## 2 4 835
## 3 10 399
## 4 9 382
## 5 12 351
## 6 8 237
## 7 11 199
## 8 NA 133
## 9 3 119
## 10 6 62
## 11 5 57
## 12 7 57
## 13 2 56
## # A tibble: 6 x 2
## Q37 n
## <dbl> <int>
## 1 1 2985
## 2 2 1016
## 3 3 29
## 4 4 44
## 5 1234 1
## 6 NA 530
Drop NAs for specific questions and filter out disciplines with fewer than 30 students in the sample (the cutoff)
## # A tibble: 18 x 2
## Q29 n
## <dbl> <int>
## 1 13 644
## 2 5 540
## 3 4 443
## 4 8 243
## 5 16 120
## 6 3 111
## 7 11 111
## 8 7 95
## 9 10 95
## 10 17 81
## 11 1 70
## 12 12 57
## 13 18 42
## 14 6 35
## 15 2 16
## 16 14 8
## 17 9 3
## 18 15 1
## # A tibble: 14 x 2
## Q29 n
## <dbl> <int>
## 1 13 644
## 2 5 540
## 3 4 443
## 4 8 243
## 5 16 120
## 6 3 111
## 7 11 111
## 8 7 95
## 9 10 95
## 10 17 81
## 11 1 70
## 12 12 57
## 13 18 42
## 14 6 35
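The filtering step above (18 disciplines reduced to 14) can be sketched in plain Python. This is an illustration of the logic only, not the original R code: the function name, the choice of `Q29` and `Q37` as the required questions, and the toy counts are all assumptions.

```python
from collections import Counter

CUTOFF = 30  # minimum students per discipline, as stated in the text


def drop_na_and_small_groups(rows, key, required, cutoff=CUTOFF):
    """Keep complete rows whose `key` group has at least `cutoff` members."""
    # Step 1: drop rows with a missing value in any required question.
    complete = [r for r in rows if all(r.get(c) is not None for c in required)]
    # Step 2: count the remaining rows per group and drop the small groups.
    counts = Counter(r[key] for r in complete)
    return [r for r in complete if counts[r[key]] >= cutoff]


# Toy data (hypothetical): the codes mirror Q29 values, the counts are made up.
rows = (
    [{"Q29": 13, "Q37": 1}] * 40
    + [{"Q29": 15, "Q37": 2}] * 3      # below the cutoff, dropped
    + [{"Q29": None, "Q37": 1}] * 5    # missing major, dropped
)
kept = drop_na_and_small_groups(rows, key="Q29", required=["Q29", "Q37"])
print(Counter(r["Q29"] for r in kept))
```

Counting after the NA drop matters: a discipline can fall below the cutoff only once its incomplete responses are removed.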
Major counts and percentages
| Q29 | major | n | pct_total | cumulat_pct |
|---|---|---|---|---|
| 13 | Mec | 644 | 23.97 | 23.97 |
| 5 | Che | 540 | 20.10 | 44.06 |
| 4 | Civ | 443 | 16.49 | 60.55 |
| 8 | Ele | 243 | 9.04 | 69.59 |
| 16 | Softw | 120 | 4.47 | 74.06 |
| 3 | Bio | 111 | 4.13 | 78.19 |
| 11 | Ind | 111 | 4.13 | 82.32 |
| 7 | Comp | 95 | 3.54 | 85.86 |
| 10 | Env/Eco | 95 | 3.54 | 89.39 |
| 17 | Str/Arc | 81 | 3.01 | 92.41 |
| 1 | Aer/Oce | 70 | 2.61 | 95.01 |
| 12 | Mat | 57 | 2.12 | 97.13 |
| 18 | Gen | 42 | 1.56 | 98.70 |
| 6 | Con | 35 | 1.30 | 100.00 |
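The `pct_total` and `cumulat_pct` columns can be reproduced from the counts alone. The plain-Python sketch below (not the original R code; `count_table` is a hypothetical helper) uses the counts from the table above and accumulates counts before rounding, which is why the second cumulative value is 44.06 rather than 23.97 + 20.10 = 44.07.

```python
def count_table(counts):
    """Return (code, n, pct_total, cumulative_pct) rows, largest group first."""
    total = sum(counts.values())
    rows, running = [], 0
    # sorted() is stable, so tied groups keep their input order.
    for code, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        running += n
        rows.append((code, n,
                     round(100 * n / total, 2),
                     round(100 * running / total, 2)))
    return rows


# Q29 code -> n, copied from the table above.
major_counts = {13: 644, 5: 540, 4: 443, 8: 243, 16: 120, 3: 111, 11: 111,
                7: 95, 10: 95, 17: 81, 1: 70, 12: 57, 18: 42, 6: 35}

table = count_table(major_counts)
for row in table[:3]:
    print(row)
```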
Gender counts overall
| Q37 | n | pct_total |
|---|---|---|
| 1 | 1973 | 73.43 |
| 2 | 678 | 25.23 |
| 3 | 15 | 0.56 |
| 4 | 21 | 0.78 |
Fill in 0s for NAs on specific items (Q1, Q3, Q5)
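A minimal plain-Python sketch of this zero-fill step, assuming item columns are named with the question number plus an optional letter suffix (for example `Q5d`). The column naming and the helper name are assumptions, and this is not the original R code.

```python
import re

# Matches Q1, Q3, Q5 and lettered sub-items such as Q5d, but not Q10 or Q37.
FILL_PATTERN = re.compile(r"Q(1|3|5)[a-z]?")


def fill_missing_with_zero(row):
    """Replace missing values with 0 on the Q1, Q3 and Q5 items only."""
    return {
        col: 0 if value is None and FILL_PATTERN.fullmatch(col) else value
        for col, value in row.items()
    }


# Hypothetical respondent: None stands in for R's NA.
row = {"Q1a": None, "Q3": None, "Q5d": 1, "Q10": None, "Q29": None}
filled = fill_missing_with_zero(row)
print(filled)
```

Matching the full column name avoids accidentally zero-filling questions such as Q10 that merely start with the same characters.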
Drop majors with low counts (fewer than 30 students in the sample)
## # A tibble: 14 x 2
## Q29 n
## <dbl> <int>
## 1 13 618
## 2 5 523
## 3 4 433
## 4 8 240
## 5 16 118
## 6 3 110
## 7 11 108
## 8 7 93
## 9 10 90
## 10 17 78
## 11 1 70
## 12 12 55
## 13 18 40
## 14 6 32
## # A tibble: 14 x 2
## major n
## <chr> <int>
## 1 Mec 618
## 2 Che 523
## 3 Civ 433
## 4 Ele 240
## 5 Softw 118
## 6 Bio 110
## 7 Ind 108
## 8 Comp 93
## 9 Env/Eco 90
## 10 Str/Arc 78
## 11 Aer/Oce 70
## 12 Mat 55
## 13 Gen 40
## 14 Con 32
Drop students whose major is NA
## # A tibble: 14 x 2
## major n
## <chr> <int>
## 1 Mec 618
## 2 Che 523
## 3 Civ 433
## 4 Ele 240
## 5 Softw 118
## 6 Bio 110
## 7 Ind 108
## 8 Comp 93
## 9 Env/Eco 90
## 10 Str/Arc 78
## 11 Aer/Oce 70
## 12 Mat 55
## 13 Gen 40
## 14 Con 32
First, perform dimension reduction using UMAP
## NULL
## [,1] [,2]
## [1,] 15.62332 -15.97910
## [2,] 11.12283 -18.44035
## [3,] -34.20802 18.18004
## [1] 15.623323 11.122833 -34.208024 -34.419584 6.551877 -34.582958
## [1] -15.97910 -18.44035 18.18004 18.60377 -23.96864 18.41222
Next, perform clustering with HDBSCAN
## HDBSCAN clustering for 2608 objects.
## Parameters: minPts = 120
## The clustering contains 6 cluster(s) and 241 noise points.
##
## 0 1 2 3 4 5 6
## 241 1080 575 175 147 264 126
##
## Available fields: cluster, minPts, cluster_scores, membership_prob,
## outlier_scores, hc
Join the data frames back together
## # A tibble: 7 x 2
## cluster n
## <dbl> <int>
## 1 1 1080
## 2 2 575
## 3 5 264
## 4 0 241
## 5 3 175
## 6 4 147
## 7 6 126
## # A tibble: 7 x 3
## cluster_time_rank cluster cluster_avg
## <int> <dbl> <dbl>
## 1 1 1 0.386
## 2 2 4 3.46
## 3 3 0 4.76
## 4 4 5 5.14
## 5 5 6 7.13
## 6 6 2 9.24
## 7 7 3 17.4
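In the output above, `cluster_time_rank` orders the clusters by ascending `cluster_avg`. A plain-Python sketch of that ranking and of joining it back onto the student records follows. It is not the original R code: the source does not show which variable is averaged, so `time_score` and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rank_clusters(records, value_key):
    """Return ({cluster: rank}, {cluster: average}), rank 1 = lowest average."""
    by_cluster = defaultdict(list)
    for r in records:
        by_cluster[r["cluster"]].append(r[value_key])
    averages = {c: mean(v) for c, v in by_cluster.items()}
    ordered = sorted(averages, key=averages.get)
    return {c: i for i, c in enumerate(ordered, start=1)}, averages


# Hypothetical records; cluster 0 is the HDBSCAN noise label.
records = [
    {"student_id": 1, "cluster": 1, "time_score": 0},
    {"student_id": 2, "cluster": 1, "time_score": 1},
    {"student_id": 3, "cluster": 3, "time_score": 17},
    {"student_id": 4, "cluster": 0, "time_score": 5},
]
ranks, averages = rank_clusters(records, "time_score")

# Join the rank back onto every record.
joined = [{**r, "cluster_time_rank": ranks[r["cluster"]]} for r in records]
print(ranks)
```

Note that the noise group (cluster 0) is ranked alongside the real clusters here, as it is in the output above.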
Set clustering colors for all plots
The same information, faceted by cluster
This is a good plot for seeing that a cluster’s beliefs about the effects of global warming on different populations at different times vary in a clear pattern
Broken plot
## # A tibble: 41,728 x 5
## student_id major cluster_time_rank Q4_item Q4_resp
## <int> <chr> <int> <chr> <dbl>
## 1 1 Ele 5 Q4a 4
## 2 1 Ele 5 Q4b 3
## 3 1 Ele 5 Q4c 3
## 4 1 Ele 5 Q4d 2
## 5 1 Ele 5 Q4e 4
## 6 1 Ele 5 Q4f 2
## 7 1 Ele 5 Q4g 2
## 8 1 Ele 5 Q4h 4
## 9 1 Ele 5 Q4i 4
## 10 1 Ele 5 Q4j 1
## # ... with 41,718 more rows
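The 41,728 rows are consistent with 2,608 students times 16 Q4 items (Q4a to Q4p), with one row per student per item. A plain-Python sketch of that wide-to-long reshape follows (not the original R code; `to_long` is a hypothetical helper). The first ten responses come from the printed rows for student 1; the remaining six are not shown in the output and are left as `None`.

```python
from string import ascii_lowercase

Q4_ITEMS = [f"Q4{c}" for c in ascii_lowercase[:16]]  # Q4a ... Q4p


def to_long(wide_rows, id_cols, item_cols):
    """Turn one row per student into one row per student per item."""
    long_rows = []
    for row in wide_rows:
        ids = {c: row[c] for c in id_cols}
        for item in item_cols:
            long_rows.append({**ids, "Q4_item": item, "Q4_resp": row[item]})
    return long_rows


shown = [4, 3, 3, 2, 4, 2, 2, 4, 4, 1]           # Q4a..Q4j from the output
responses = shown + [None] * (16 - len(shown))   # Q4k..Q4p are not printed
student_1 = {"student_id": 1, "major": "Ele", "cluster_time_rank": 5,
             **dict(zip(Q4_ITEMS, responses))}

long_rows = to_long([student_1],
                    ["student_id", "major", "cluster_time_rank"], Q4_ITEMS)
print(len(long_rows), long_rows[0])
```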
| Q4_item | Q4_item_name | statistic | p.value | parameter (df) | method |
|---|---|---|---|---|---|
| Q4a | Make money | 27.77257 | 0.2697493 | 24 | Pearson’s Chi-squared test |
| Q4b | Fame | 23.05203 | 0.5167281 | 24 | Pearson’s Chi-squared test |
| Q4c | Help others | 59.77527 | 0.0000687 | 24 | Pearson’s Chi-squared test |
| Q4d | Supervise others | 24.54317 | 0.4309141 | 24 | Pearson’s Chi-squared test |
| Q4e | Job sec. and opp. | 41.83517 | 0.0134640 | 24 | Pearson’s Chi-squared test |
| Q4f | Work w/ people | 49.69039 | 0.0015501 | 24 | Pearson’s Chi-squared test |
| Q4g | Invent/design | 33.19928 | 0.0999380 | 24 | Pearson’s Chi-squared test |
| Q4h | Develop knowledge/skill | 30.05447 | 0.1829529 | 24 | Pearson’s Chi-squared test |
| Q4i | Personal/fam. time | 30.50092 | 0.1686997 | 24 | Pearson’s Chi-squared test |
| Q4j | Easy job | 72.65467 | 0.0000009 | 24 | Pearson’s Chi-squared test |
| Q4k | Exciting env. | 29.78337 | 0.1920356 | 24 | Pearson’s Chi-squared test |
| Q4l | Solve societal prob. | 94.89931 | 0.0000000 | 24 | Pearson’s Chi-squared test |
| Q4m | Use talent/abilities | 37.59814 | 0.0381002 | 24 | Pearson’s Chi-squared test |
| Q4n | Do hands-on work | 27.20240 | 0.2951075 | 24 | Pearson’s Chi-squared test |
| Q4o | Apply math/sci. | 26.14879 | 0.3456586 | 24 | Pearson’s Chi-squared test |
| Q4p | Volunteer w/ charity | 78.86848 | 0.0000001 | 24 | Pearson’s Chi-squared test |
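For reference, the `statistic` and `parameter` columns of a Pearson chi-squared test can be computed from a contingency table as sketched below in plain Python (not the original R code; the table values are made up). A `parameter` of 24 would be consistent with a 7 × 5 table, for example seven cluster groups by five response levels, although the source does not state the grouping. The p-value needs the chi-squared distribution, which the standard library does not provide, so it is omitted.

```python
def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    No continuity correction is applied.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Made-up 2 x 2 table: every expected count is 15.
stat, df = chi_squared([[10, 20], [20, 10]])
print(round(stat, 4), df)

# A 7 x 5 table gives the 24 degrees of freedom reported above.
_, df_7x5 = chi_squared([[1] * 5 for _ in range(7)])
print(df_7x5)
```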
| Q4_item | Q4_item_name | statistic | p.value | parameter (df) | method |
|---|---|---|---|---|---|
| Q4a | Make money | 12.819453 | 0.0121925 | 4 | Kruskal-Wallis rank sum test |
| Q4b | Fame | 2.078992 | 0.7212328 | 4 | Kruskal-Wallis rank sum test |
| Q4c | Help others | 25.037874 | 0.0000494 | 4 | Kruskal-Wallis rank sum test |
| Q4d | Supervise others | 2.835811 | 0.5856670 | 4 | Kruskal-Wallis rank sum test |
| Q4e | Job sec. and opp. | 12.838757 | 0.0120911 | 4 | Kruskal-Wallis rank sum test |
| Q4f | Work w/ people | 3.993174 | 0.4069304 | 4 | Kruskal-Wallis rank sum test |
| Q4g | Invent/design | 2.182131 | 0.7023020 | 4 | Kruskal-Wallis rank sum test |
| Q4h | Develop knowledge/skill | 7.608165 | 0.1070332 | 4 | Kruskal-Wallis rank sum test |
| Q4i | Personal/fam. time | 3.142947 | 0.5341950 | 4 | Kruskal-Wallis rank sum test |
| Q4j | Easy job | 5.616663 | 0.2296635 | 4 | Kruskal-Wallis rank sum test |
| Q4k | Exciting env. | 13.322136 | 0.0098045 | 4 | Kruskal-Wallis rank sum test |
| Q4l | Solve societal prob. | 74.868555 | 0.0000000 | 4 | Kruskal-Wallis rank sum test |
| Q4m | Use talent/abilities | 20.300904 | 0.0004355 | 4 | Kruskal-Wallis rank sum test |
| Q4n | Do hands-on work | 5.069584 | 0.2802320 | 4 | Kruskal-Wallis rank sum test |
| Q4o | Apply math/sci. | 1.734258 | 0.7844857 | 4 | Kruskal-Wallis rank sum test |
| Q4p | Volunteer w/ charity | 27.906034 | 0.0000130 | 4 | Kruskal-Wallis rank sum test |
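The Kruskal-Wallis `statistic` is the tie-corrected H computed on pooled ranks, and `parameter` is the number of groups minus one, so a value of 4 implies five groups. A plain-Python sketch with made-up data follows. It is not the original R code, and the source does not show how the groups were formed.

```python
def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H and degrees of freedom (k - 1).

    Raises ZeroDivisionError if every observation is identical.
    """
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i                              # size of this run of ties
        rank_of[pooled[i]] = (i + 1 + j) / 2   # midrank of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    rank_sums = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * rank_sums - 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    return h / correction, len(groups) - 1


# Made-up data without ties: rank sums are 6, 15 and 24.
h, df = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 4), df)

# Made-up data with ties, as is typical for Likert-style responses.
h_tied, df_tied = kruskal_wallis([[1, 1], [2, 2]])
print(round(h_tied, 4), df_tied)
```

The tie correction matters for survey items with only a few response levels, where most observations share a rank.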
## # A tibble: 7 x 3
## cluster_time_rank n avg_Q5_total
## <int> <int> <dbl>
## 1 1 1080 3.68
## 2 5 126 3.21
## 3 2 147 3.18
## 4 4 264 2.98
## 5 3 241 2.97
## 6 6 575 2.71
## 7 7 175 2.06
| Q5_item | Q5_item_name | statistic | p.value | parameter (df) | method |
|---|---|---|---|---|---|
| Q5a | Energy (supply/demand) | 13.803697 | 0.0319075 | 6 | Pearson’s Chi-squared test |
| Q5b | Disease | 7.383852 | 0.2868020 | 6 | Pearson’s Chi-squared test |
| Q5c | Poverty and wealth dist. | 52.194355 | 0.0000000 | 6 | Pearson’s Chi-squared test |
| Q5d | Climate change | 203.090564 | 0.0000000 | 6 | Pearson’s Chi-squared test |
| Q5e | Terrorism and war | 9.742207 | 0.1359365 | 6 | Pearson’s Chi-squared test |
| Q5f | Water supply | 27.878110 | 0.0000991 | 6 | Pearson’s Chi-squared test |
| Q5g | Food availability | 30.949248 | 0.0000259 | 6 | Pearson’s Chi-squared test |
| Q5h | Opp. for future gen | 15.781208 | 0.0149778 | 6 | Pearson’s Chi-squared test |
| Q5i | Opp. for women and/or min. | 85.960121 | 0.0000000 | 6 | Pearson’s Chi-squared test |
| Q5j | Environmental degradation | 117.482450 | 0.0000000 | 6 | Pearson’s Chi-squared test |